Optimizing Data-Parallel Stencil
Authors
Abstract
We have developed a communication optimizer that concentrates on stencil communication patterns. The optimizer was built in the context of the UNH C* compiler, which targets distributed-memory MIMD computers. Our work has two distinguishing features. First, the compiler/optimizer is designed to be highly portable; we achieve this goal by providing efficient support for the optimizations in the run-time library. Second, in addition to aggregating messages that share the same source and destination, we employ a specialized store-and-forward protocol that reduces the total number of messages initiated.
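The two message-reduction ideas named in the abstract can be illustrated with a small counting sketch. This is not the UNH C* run-time library: the 3x3 wraparound processor grid, the 9-point neighborhood, the array names "A" and "B", the row-then-column routing rule, and all function names are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import product

ROWS, COLS = 3, 3  # hypothetical 3x3 processor grid with wraparound


def neighbors(proc):
    """The 8 surrounding processors of a 9-point stencil (assumed pattern)."""
    r, c = proc
    return [((r + dr) % ROWS, (c + dc) % COLS)
            for dr, dc in product((-1, 0, 1), repeat=2)
            if (dr, dc) != (0, 0)]


def direct_messages(arrays):
    """One message per (source, destination, array): no optimization."""
    return [(src, dst, name)
            for src in product(range(ROWS), range(COLS))
            for dst in neighbors(src)
            for name in arrays]


def aggregate(messages):
    """Combine all payloads that share the same source and destination."""
    buckets = defaultdict(list)
    for src, dst, payload in messages:
        buckets[(src, dst)].append(payload)
    return dict(buckets)


def store_and_forward(messages):
    """Route each diagonal message through an intermediate processor.

    Assumed rule: travel along the source's row first, then along the
    destination's column, so every hop connects axis-aligned neighbors.
    """
    hops = []
    for src, dst, payload in messages:
        via = (src[0], dst[1])  # same row as src, same column as dst
        if via != src and via != dst:
            hops.append((src, via, payload))   # first hop
            hops.append((via, dst, payload))   # forwarded hop
        else:
            hops.append((src, dst, payload))   # already axis-aligned
    return hops


msgs = direct_messages(["A", "B"])
print(len(msgs))                                # 144
print(len(aggregate(msgs)))                     # 72
print(len(aggregate(store_and_forward(msgs))))  # 36
```

Under these assumptions, aggregation halves the message count because two arrays share each source/destination pair, and forwarding diagonal data through axis-aligned neighbors halves it again, since each processor then initiates only four messages. A real protocol would also have to order the hops so that forwarded data is received before it is sent on.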
Similar resources
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications
This paper fully develops Diamond Tiling, a technique to partition the computations of stencil applications such as FDTD. The Diamond Tiling technique is the result of optimizing the amount of useful computations that can be executed when a region of memory is loaded to the local memory of a multiprocessor chip. Diamond Tiling contributes to the state of the art on time tiling techniques in tha...
Optimizing Transformations of Stencil Operations for Parallel Cache-based Architectures
This paper describes a new technique for optimizing serial and parallel stencil and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicit in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empiri...
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of ...
Domain-Specific Optimization of Two Jacobi Smoother Kernels and Their Evaluation in the ECM Performance Model
Our aim is to apply program transformations to stencil codes in order to yield the highest possible performance. We recognize memory bandwidth as a major limitation in stencil code performance. We conducted a study in which we applied optimizing transformations to two Jacobi smoother kernels: one 3D 1st-order 7-point stencil and one 3D 3rd-order 19-point stencil. To obtain high performance, the...
Multicore-optimized wavefront diamond blocking for optimizing stencil updates
The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, es...